Workdocumentation 2022-08-19
Participants
- Wolfgang
Agenda
- CEUR-WS Volume index.html fixes
- dblp
CEUR-WS Volume index.html fixes
wf@capri:/hd/luxio/CEUR-WS/www/Vol-457$ git diff 8285e6269493d1dd8d6dde06e4f4805fedb4d3f5 index.html
diff --git a/www/Vol-457/index.html b/www/Vol-457/index.html
index 1fef928a2..21cf5bed7 100644
--- a/www/Vol-457/index.html
+++ b/www/Vol-457/index.html
@@ -34,7 +34,7 @@ owners.</font></p>
<h1><a href="http://www.bgu.ac.il/~sturm/DE@CAiSE09/">DE@CAiSE'09</a><br>
-Domain Engineering
+Domain Engineering</h1>
<h3>Proceedings of the First International Workshop on Domain Engineering held in
conjunction with <A href="http://caise09.thenetworkinstitute.eu/index.php">CAiSE'09</a> Conference</h3>
@@ -91,4 +91,4 @@ Paul Johannesson<sup><font size=-1>2</font></sup>, Royal Institute of Technology
</body>
</html>
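The fix above closes an `<h1>` element that was left open before the following `<h3>`. A minimal sketch of how such cases could be detected across volumes, assuming the heuristic "an `<h1>` is still open when the next heading starts" is good enough; the class and function names are hypothetical and not part of any CEUR-WS tooling:

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class UnclosedH1Checker(HTMLParser):
    """Records <h1> tags that are still open when the next heading starts."""

    def __init__(self):
        super().__init__()
        self.h1_open_at = None  # line number of the currently open <h1>, if any
        self.problems = []      # line numbers of <h1> tags that were never closed

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            if self.h1_open_at is not None:
                # a new heading begins although the previous <h1> was not closed
                self.problems.append(self.h1_open_at)
                self.h1_open_at = None
            if tag == "h1":
                self.h1_open_at = self.getpos()[0]

    def handle_endtag(self, tag):
        if tag == "h1":
            self.h1_open_at = None


def unclosed_h1_lines(html):
    """Return the 1-based line numbers of <h1> tags without a matching </h1>."""
    checker = UnclosedH1Checker()
    checker.feed(html)
    checker.close()
    if checker.h1_open_at is not None:
        # the document ended while an <h1> was still open
        checker.problems.append(checker.h1_open_at)
    return checker.problems
```

Calling `unclosed_h1_lines(open("index.html").read())` on a volume page would return an empty list once the heading is closed as in the diff.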
dblp
Import RDF Dump to QLever (39 min)
See Workdocumentation_2022-08-16#on_RWTH_Aachen_DBIS_i5_server for the preparations.
Steps with QLever Control script
Download and Indexing
wf@confident:/hd/torterra/dblp2022-08$ . ../qlever/qlever-control/qlever dblp
QLEVER CONFIG
Checking your PATH ...
Added the directory "/hd/torterra/qlever/qlever-control" to your PATH
Setting up bash autocompletion ...
Done, number of completions: 35
Creating new Qleverfile ...
Copied pre-configured Qleverfile for "dblp" into current directory.
Setup is complete
Type "qlever" and use autocompletion to see which actions are available. Add a
"show" in the end to see what an action does without executing it (for example,
"qlever index show"). Typing "qlever" without arguments gives some basic help
and pointers for further help. Edit your local "Qleverfile" to change settings.
wf@confident:/hd/torterra/dblp2022-08$ qlever get-data
This is the "qlever" script, call without argument for help
Executing "get-data":
wget -nc -O dblp.nt.gz https://dblp.org/rdf/dblp.nt.gz
Getting data using GET_DATA_CMD from Qleverfile ...
--2022-08-19 07:16:17-- https://dblp.org/rdf/dblp.nt.gz
Resolving dblp.org (dblp.org)... 192.76.146.204
Connecting to dblp.org (dblp.org)|192.76.146.204|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2793364255 (2.6G) [application/x-gzip]
Saving to: ‘dblp.nt.gz’
dblp.nt.gz 100%[===================>] 2.60G 43.3MB/s in 64s
2022-08-19 07:17:21 (41.8 MB/s) - ‘dblp.nt.gz’ saved [2793364255/2793364255]
wf@confident:/hd/torterra/dblp2022-08$ qlever index
This is the "qlever" script, call without argument for help
Executing "index":
bash -c "zcat dblp.nt.gz | IndexBuilderMain -F ttl -f - -i dblp -s dblp.settings.json --words-from-literals | tee dblp.index-log.txt"
bash: IndexBuilderMain: command not found
wf@confident:/hd/torterra/dblp2022-08$
Max RAM usage: 0.0 GB
wf@confident:/hd/torterra/dblp2022-08$ ls
Qleverfile dblp.index-log.txt dblp.nt.gz dblp.settings.json
wf@confident:/hd/torterra/dblp2022-08$ vi Qleverfile
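# set USE_DOCKER = true so that the index is built inside the adfreiburg/qlever container, since IndexBuilderMain is not installed on this host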
wf@confident:/hd/torterra/dblp2022-08$ qlever index
This is the "qlever" script, call without argument for help
Executing "index":
docker run -it --rm -u 1001:1001 -v /hd/torterra/dblp2022-08:/index -w /index --entrypoint bash --name qlever.dblp.index-build adfreiburg/qlever -c "zcat dblp.nt.gz | IndexBuilderMain -F ttl -f - -i dblp -s dblp.settings.json --words-from-literals | tee dblp.index-log.txt"
2022-08-19 05:19:12.735 - INFO: QLever IndexBuilder, compiled on Mon Aug 15 05:40:57 UTC 2022 using git hash 406dda
2022-08-19 05:19:12.736 - INFO: You specified the input format: TTL
2022-08-19 05:19:12.737 - INFO: Locale was not specified in settings file, default is en_US
2022-08-19 05:19:12.737 - INFO: You specified "locale = en_US" and "ignore-punctuation = 0"
2022-08-19 05:19:12.738 - INFO: You specified "ascii-prefixes-only = true", which enables faster parsing for well-behaved TTL files
2022-08-19 05:19:12.738 - INFO: You specified "num-triples-per-batch = 5,000,000", choose a lower value if the index builder runs out of memory
2022-08-19 05:19:12.738 - INFO: Integers that cannot be represented by QLever will throw an exception (this is the default behavior)
2022-08-19 05:19:12.738 - INFO: Processing input triples from /dev/stdin ...
2022-08-19 05:31:18.190 - INFO: Triples converted: 100,000,000
2022-08-19 05:31:36.447 - INFO: Triples converted: 200,000,000
2022-08-19 05:31:48.312 - INFO: Done, total number of triples converted: 268,701,236
2022-08-19 05:31:48.318 - INFO: Building prefix tree from internal vocabulary ...
2022-08-19 05:32:32.605 - INFO: Computing maximally compressing prefixes (greedy algorithm) ...
2022-08-19 05:33:59.130 - INFO: Reduction of size of internal vocabulary: 24%
2022-08-19 05:34:02.208 - INFO: Writing compressed vocabulary to disk ...
2022-08-19 05:35:42.396 - INFO: Creating a pair of index permutations ...
2022-08-19 05:37:03.671 - INFO: Statistics for PSO: #relations = 65, #blocks = 542, #triples = 268,672,977
2022-08-19 05:37:03.674 - INFO: Statistics for POS: #relations = 65, #blocks = 542, #triples = 268,672,977
2022-08-19 05:37:03.675 - INFO: Exchanging multiplicities for PSO and POS ...
2022-08-19 05:37:03.675 - INFO: Writing meta data for PSO and POS ...
2022-08-19 05:37:08.712 - INFO: Creating a pair of index permutations ...
2022-08-19 05:38:11.124 - INFO: Statistics for SPO: #relations = 44,834,357, #blocks = 342, #triples = 268,672,977
2022-08-19 05:38:11.124 - INFO: Statistics for SOP: #relations = 44,834,357, #blocks = 342, #triples = 268,672,977
2022-08-19 05:38:11.124 - INFO: Exchanging multiplicities for SPO and SOP ...
2022-08-19 05:38:21.281 - INFO: Writing meta data for SPO and SOP ...
2022-08-19 05:38:21.385 - INFO: Number of distinct patterns: 1,276
2022-08-19 05:38:21.385 - INFO: Number of subjects with pattern: 44,834,357 [all]
2022-08-19 05:38:21.385 - INFO: Total number of distinct subject-predicate pairs: 228,395,931
2022-08-19 05:38:21.385 - INFO: Average number of predicates per subject: 5.1
2022-08-19 05:38:21.389 - INFO: Average number of subjects per predicate: 3,625,332
2022-08-19 05:38:28.373 - INFO: Creating a pair of index permutations ...
2022-08-19 05:39:29.422 - INFO: Statistics for OSP: #relations = 85,894,696, #blocks = 435, #triples = 268,672,977
2022-08-19 05:39:29.423 - INFO: Statistics for OPS: #relations = 85,894,696, #blocks = 435, #triples = 268,672,977
2022-08-19 05:39:29.423 - INFO: Exchanging multiplicities for OSP and OPS ...
2022-08-19 05:39:48.764 - INFO: Writing meta data for OSP and OPS ...
2022-08-19 05:39:48.946 - INFO: Index build completed
2022-08-19 05:39:49.086 - INFO:
2022-08-19 05:39:49.086 - INFO: Adding text index ...
2022-08-19 05:39:49.086 - INFO: Considering each literal as a text record
2022-08-19 05:39:49.099 - INFO: The git hash used to build this index was "406ddab3953b604f7f37e83307b8c3db5a3c04dd"
2022-08-19 05:39:49.100 - INFO: Reading vocabulary from file dblp.vocabulary.internal ...
2022-08-19 05:39:58.361 - INFO: Done, number of words: 92,096,717
2022-08-19 05:39:58.361 - INFO: Building text vocabulary ...
2022-08-19 05:41:07.506 - INFO: Writing vocabulary to file dblp.text.vocabulary ...
2022-08-19 05:41:07.592 - INFO: Done, number of words: 9,463,510
2022-08-19 05:41:07.896 - INFO: Building the half-inverted index lists ...
2022-08-19 05:46:10.425 - WARN: Entity from text not in KB: "James Cummings and Ernest Schimmerling, editors. Lecture Note Series of the London Mathematical Society, vol. 406. Cambridge University Press, New York, xi + 419 pp. - Paul B. Larson, Peter Lumsdaine, and Yimu Yin. An introduction to Pmax forcing. pp. 5-23. - Simon Thomas and Scott Schneider. Countable Borel equivalence relations. pp. 25-62. - Ilijas Farah and Eric Wofsey. Set theory and operator algebras. pp. 63-119. - Justin Moore and David Milovich. A tutorial on set mapping reflection. pp. 121-144. - Vladimir G. Pestov and Aleksandra Kwiatkowska. An introduction to hyperlinear and sofic groups. pp. 145-185. - Itay Neeman and Spencer Unger. Aronszajn trees and the SCH. pp. 187-206. - Todd Eisworth, Justin Tatch Moore, and David Milovich. Iterated forcing and the Continuum Hypothesis. pp. 207-244. - Moti Gitik and Spencer Unger. Short extender forcing. pp. 245-263. - Alexander S. Kechris and Robin D. Tucker-Drob. The complexity of classification problems in ergodic theory. pp. 265-299. - Menachem Magidor and Chris Lambie-Hanson. On the strengths and weaknesses of weak squares. pp. 301-330. - Boban Veličković and Giorgio Venturi. Proper forcing remastered. pp. 331-362. - Asger ToÖrnquist and Martino Lupini. Set theory and von Neumann algebras. pp. 363-396. - W. Hugh Woodin, Jacob Davis, and Daniel RodrÍguez. The HOD dichotomy. pp. 397-419."
2022-08-19 05:47:50.808 - WARN: Entity from text not in KB: "Natasha Dobrinen: James Cummings and Ernest Schimmerling, editors. Lecture Note Series of the London Mathematical Society, vol. 406. Cambridge University Press, New York, xi + 419 pp. - Paul B. Larson, Peter Lumsdaine, and Yimu Yin. An introduction to Pmax forcing. pp. 5-23. - Simon Thomas and Scott Schneider. Countable Borel equivalence relations. pp. 25-62. - Ilijas Farah and Eric Wofsey. Set theory and operator algebras. pp. 63-119. - Justin Moore and David Milovich. A tutorial on set mapping reflection. pp. 121-144. - Vladimir G. Pestov and Aleksandra Kwiatkowska. An introduction to hyperlinear and sofic groups. pp. 145-185. - Itay Neeman and Spencer Unger. Aronszajn trees and the SCH. pp. 187-206. - Todd Eisworth, Justin Tatch Moore, and David Milovich. Iterated forcing and the Continuum Hypothesis. pp. 207-244. - Moti Gitik and Spencer Unger. Short extender forcing. pp. 245-263. - Alexander S. Kechris and Robin D. Tucker-Drob. The complexity of classification problems in ergodic theory. pp. 265-299. - Menachem Magidor and Chris Lambie-Hanson. On the strengths and weaknesses of weak squares. pp. 301-330. - Boban Veličković and Giorgio Venturi. Proper forcing remastered. pp. 331-362. - Asger ToÖrnquist and Martino Lupini. Set theory and von Neumann algebras. pp. 363-396. - W. Hugh Woodin, Jacob Davis, and Daniel RodrÍguez. The HOD dichotomy. pp. 397-419. (2014)"
2022-08-19 05:49:30.949 - WARN: Entity from text not in KB: "Tony Owen: Numerical Recipes Book (PASCAL) by William H. Press, Brian P. Flannery, Saul A. Teukolsky and William T. Vetterling Cambridge University Press, Cambridge, 1990, 759 pages including index (£30.00 hdb).Numerical Recipes Diskette (PASCAL) version 2.0 by William H. Press, et al. Cambridge University Press, Cambridge, 03 1990 (£21.50).Numerical Recipes Example Handbook (PASCAL) by William H. Press, Brian P. Flannery, Saul A. Teukolsky and William T. Vetterling Cambridge University Press, Cambridge, 09 1990, 223 pages including index of demonstrated procedures (£19·50, hdb).Numerical Recipes Example Diskette (PASCAL) version 2.0 by William H. Press et al. Cambridge University Press, Cambridge, 02 1990 (£21.50).Numerical Recipes Routines and Examples in Basic by Julian C. Sprott Cambridge University Press, Cambridge (paperback), 1991, 398 pages including index of programs (£19.50; pbk).Numerical Recipes Diskette Basic version 1.0 by Julian C. Sprott Cambridge University Press, Cambridge, 1991 (£21.50). (1992)"
2022-08-19 05:50:15.628 - WARN: Number of mentions of entities not found in the vocabulary: 3
2022-08-19 05:55:07.011 - INFO: Statistics for text index: #records = 32,052,337, #words = 256,962,549, #entities = 32,052,337, #blocks = 32,279,050
2022-08-19 05:55:12.745 - INFO: Text index build completed
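The duration of the index build can be read off the log timestamps. A minimal sketch, assuming every relevant log line starts with a `YYYY-MM-DD HH:MM:SS.mmm - ` prefix as in the output above; the function name `log_elapsed` is hypothetical:

```python
import re
from datetime import datetime, timedelta

# timestamp prefix as it appears in the IndexBuilder log lines
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) - ")


def log_elapsed(lines) -> timedelta:
    """Return the time span between the earliest and latest log timestamp."""
    stamps = []
    for line in lines:
        match = TIMESTAMP.match(line)
        if match:  # lines without a timestamp prefix are skipped
            stamps.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f"))
    if not stamps:
        raise ValueError("no timestamped log lines found")
    return max(stamps) - min(stamps)


# first and last line of the index build log shown above
sample = [
    "2022-08-19 05:19:12.735 - INFO: QLever IndexBuilder, compiled on Mon Aug 15 05:40:57 UTC 2022 using git hash 406dda",
    "2022-08-19 05:55:12.745 - INFO: Text index build completed",
]
print(log_elapsed(sample))  # prints 0:36:00.010000
```

Applied to the complete `dblp.index-log.txt`, this gives the time for indexing only, about 36 minutes here; the download is not covered by this log.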
Server Start
qlever start
This is the "qlever" script, call without argument for help
Executing "start":
docker run -d --restart unless-stopped -u 1001:1001 -it -v /hd/torterra/qlever/dblp:/index -p 7015:7015 -w /index --entrypoint bash --name qlever.dblp adfreiburg/qlever -c "ServerMain -i dblp -j 8 -p 7015 -m 20 -c 5 -e 1 -k 100 -a \"dblp_620614028\" -t > dblp.server-log.txt" > /dev/null
Starting the QLever server in the background and waiting until it's ready (Ctrl+C will not kill it) ...
2022-08-19 06:02:25.290 - INFO: QLever Server, compiled on Mon Aug 15 05:40:57 UTC 2022 using git hash 406dda
2022-08-19 06:02:25.294 - INFO: Initializing server ...
2022-08-19 06:02:25.297 - INFO: The git hash used to build this index was "406ddab3953b604f7f37e83307b8c3db5a3c04dd"
2022-08-19 06:02:25.298 - INFO: Reading vocabulary from file dblp.vocabulary.internal ...
2022-08-19 06:02:33.264 - INFO: Done, number of words: 92,096,717
2022-08-19 06:02:33.266 - INFO: Registered PSO permutation: #relations = 65, #blocks = 542, #triples = 268,672,977
2022-08-19 06:02:33.267 - INFO: Registered POS permutation: #relations = 65, #blocks = 542, #triples = 268,672,977
2022-08-19 06:02:33.268 - INFO: Registered OPS permutation: #relations = 85,894,696, #blocks = 435, #triples = 268,672,977
2022-08-19 06:02:33.269 - INFO: Registered OSP permutation: #relations = 85,894,696, #blocks = 435, #triples = 268,672,977
2022-08-19 06:02:33.270 - INFO: Registered SPO permutation: #relations = 44,834,357, #blocks = 342, #triples = 268,672,977
2022-08-19 06:02:33.270 - INFO: Registered SOP permutation: #relations = 44,834,357, #blocks = 342, #triples = 268,672,977
2022-08-19 06:02:33.270 - INFO: Reading patterns from file dblp.index.patterns ...
2022-08-19 06:02:34.049 - INFO: Reading vocabulary from file dblp.text.vocabulary ...
2022-08-19 06:02:34.424 - INFO: Done, number of words: 9,463,510
2022-08-19 06:02:34.424 - INFO: Reading metadata from file dblp.text.index ...
2022-08-19 06:02:36.068 - INFO: Registered text index: #records = 32,052,337, #words = 256,962,549, #entities = 32,052,337, #blocks = 32,279,050
2022-08-19 06:02:36.232 - INFO: Sorting random result tables to estimate the sorting performance of this machine ...
2022-08-19 06:02:37.124 - INFO: Access token for restricted API calls is "****"
2022-08-19 06:02:37.124 - INFO: The server is ready, listening for requests on port 7015 ...
2022-08-19 06:02:37.438 - INFO:
2022-08-19 06:02:37.438 - INFO: Request received via GET, no content type specified
2022-08-19 06:02:37.438 - INFO: Alive check with message "from the qlever script"
2022-08-19 06:02:37.451 - INFO:
2022-08-19 06:02:37.451 - INFO: Request received via GET, no content type specified
2022-08-19 06:02:37.451 - INFO: Setting index description to: "RDF from https://dblp.org/rdf/dblp.nt.gz, version from 19.08.2022 01:33"
2022-08-19 06:02:37.463 - INFO:
2022-08-19 06:02:37.463 - INFO: Request received via GET, no content type specified
2022-08-19 06:02:37.463 - INFO: Setting text description to: "All literals, search with FILTER CONTAINS(?var, "...")"
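Once the server reports that it is listening on port 7015, it can be queried over HTTP. A minimal sketch using only the Python standard library, assuming the server is reachable as `http://localhost:7015/` (the host name is an assumption, only the port is taken from the log) and that it accepts the query in the standard SPARQL protocol `query` parameter of a GET request; the helper names are hypothetical:

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url, sparql):
    """Return the GET URL that carries the SPARQL query in the 'query' parameter."""
    return base_url + "?" + urllib.parse.urlencode({"query": sparql})


def run_query(base_url, sparql, timeout=30):
    """Send the query and return the decoded SPARQL JSON result document."""
    request = urllib.request.Request(
        build_query_url(base_url, sparql),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, `run_query("http://localhost:7015/", "SELECT * WHERE { ?s ?p ?o } LIMIT 3")` should return a dictionary whose `results` / `bindings` entry holds the rows, provided the endpoint answers in the standard SPARQL JSON results format.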
Test Queries
See https://dblp.org/rdf/schema.nt for the dblp RDF schema.
classHistogramm
sparqlquery -qp ./queries.yaml -qn classHistogramm -en dblp -f mediawiki
query
SELECT ?c (COUNT(?c) AS ?count)
WHERE {
  ?subject a ?c
}
GROUP BY ?c
HAVING (?count > 100)
ORDER BY DESC(?count)
result
propertyHistogramm
sparqlquery -qp ./queries.yaml -qn propertyHistogramm -en dblp -f mediawiki
query
SELECT ?property (COUNT(?property) AS ?propTotal)
WHERE {
  ?s ?property ?o .
}
GROUP BY ?property
HAVING (?propTotal > 1000)
ORDER BY DESC(?propTotal)
result
CEUR-WS Papercount
sparqlquery -en dblp -qp ./dblp.yaml -qn "CEUR-WS Papercount" -f mediawiki
query
PREFIX dblp: <https://dblp.org/rdf/schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT (COUNT(?paper) AS ?count)
WHERE {
  ?proceeding dblp:publishedIn "CEUR Workshop Proceedings" .
  ?paper dblp:publishedAsPartOf ?proceeding .
}
result
| count |
|---|
| 45158 |
CEUR-WS Counts
sparqlquery -en dblp -qp ./dblp.yaml -qn "CEUR-WS Counts" -f mediawiki
query
PREFIX dblp: <https://dblp.org/rdf/schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT (COUNT(DISTINCT ?author) AS ?numberOfAuthors)
       (COUNT(DISTINCT ?paper) AS ?numberOfPapers)
       (COUNT(DISTINCT ?editor) AS ?numberOfEditors)
       (COUNT(DISTINCT ?proceeding) AS ?numberOfVolumes)
WHERE {
  ?proceeding dblp:publishedIn "CEUR Workshop Proceedings" .
  OPTIONAL { ?proceeding dblp:editedBy ?editor }
  OPTIONAL {
    ?paper dblp:publishedAsPartOf ?proceeding .
    OPTIONAL { ?paper dblp:authoredBy ?author }
  }
}
result
| numberOfAuthors | numberOfPapers | numberOfEditors | numberOfVolumes |
|---|---|---|---|
| 71260 | 45158 | 4665 | 2399 |
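The two CEUR-WS queries can be cross-checked against each other, and the counts give a rough average volume size. A small sketch using only the numbers from the result tables; the variable names are chosen for illustration:

```python
# values copied from the "CEUR-WS Papercount" and "CEUR-WS Counts" results
papercount = 45158
counts = {
    "numberOfAuthors": 71260,
    "numberOfPapers": 45158,
    "numberOfEditors": 4665,
    "numberOfVolumes": 2399,
}

# both queries should agree on the number of papers
papers_match = papercount == counts["numberOfPapers"]

# average number of papers per CEUR-WS volume known to dblp
papers_per_volume = counts["numberOfPapers"] / counts["numberOfVolumes"]

print(papers_match)                # prints True
print(f"{papers_per_volume:.2f}")  # prints 18.82
```

The author and editor counts are distinct persons over all volumes, so dividing them by the number of papers or volumes would not give a meaningful per-paper or per-volume average.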